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Microwave background temperature and polarization observations are a powerful way to constrain 
cosmological parameters if the likelihood function can be calculated accurately. The temperature 
and polarization fields are correlated, partial sky coverage correlates power spectrum estimators at 
different I, and the likelihood function for a theory spectrum given a set of observed estimators 
is non-Gaussian. An accurate analysis must model all these properties. Most existing likelihood 
approximations are good enough for a temperature-only analysis, however they cannot reliably 
handle a temperature-polarization correlations. We give a new general approximation applicable for 
correlated Gaussian fields observed on part of the sky. The approximation models the non-Gaussian 
form exactly in the ideal full-sky limit and is fast to evaluate using a pre-computed covariance 
matrix and set of power spectrum estimators. We show with simulations that it is good enough to 
obtain correct results at I > 30 where an exact calculation becomes impossible. We also show that 
some Gaussian approximations give reliable parameter constraints even though they do not capture 
the shape of the likelihood function at each I accurately. Finally we test the approximations on 
simulations with realistically anisotropic noise and asymmetric foreground mask. 



I. INTRODUCTION 



The Cosmic Microwave Background (CMB) anisotropics are a powerful cosmological probe as they depend simply 
on the primordial inhomogeneities, content and geometry of the Universe. If the perturbations are Gaussian, the 
full-sky power spectra of the CMB anisotropics and their polarization contain all of the cosmological information. 
Information in the polarization power spectra can help to break degeneracies that are present if only temperature 
information is used, and also helps to reduce cosmic variance uncertainty. Parameter constraints can therefore be 
significantly improved by using polarization information even if the data is significantly noisier than the temperature. 

An accurate joint likelihood analysis of the CMB temperature and polarization data is crucial to estimate cos- 
mological parameters reliably. In principle this is straightforward at linear order if the primordial perturbations are 
Gaussian as the distribution can be calculated exactly. However calculating the likelihood exactly from partial sky 
data with anisotropic noise is computationally prohibitive except at low / because large matrices need to be inverted. 
Most analyses therefore rely on approximations to the likelihood function at high I, using only the information in a 
set of estimators for the power spectra and a covariance estimated (or calibrated) from simulations. An alternative 
approach not considered further here would be to use the Gibbs sampling approach of Ref . p], Q , though this has 
serious convergence problems of its own Q . 

If the likelihood of the theory power spectrum Ci as a function of the measured estimators Ci were Gaussian the 
likelihood could be calculated straightforwardly from the measured C;. However the distribution is non-Gaussian 
because for a given temperature power spectrum Ci, the a sum of squares of Gaussian harmonic coefficients, have 
a (reduced) x-squared distribution. At large I the distribution does tend to Gaussian by the central limit theorem; for 
example the mean and maximum likelihood values of Ci converge as 1/2. However the precision with which we can 
hope to measure the cosmological parameters also improves at 1/Z max , so the relative bias due to the non-Gaussianity 
is potentially independent of I. On all scales the distribution must be modeled carefully to get unbiased cosmological 
parameter constraints. 

The importance of the non-Gaussianity of the temperature likelihood function at low / is well known, and there 
are several well established likelihood approximations to model it 4 7]. Current polarization data only contributes 
interesting information at low I where an exact likelihood can be used (§l-[l0|. however in the future the small-scale 
polarization signal will be less noise dominated and contain useful information. On small scales the likelihood function 
cannot be computed exactly in reasonable time, and the likelihood function is significantly more complicated than for 
the temperature because the temperature and polarization fields are correlated. The only existing attempt to model 
the polarized likelihood function at high I, Ref. 0], relies on variable transformations that are not guaranteed to be 
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well defined, and is untested in practice. We give a new general well-defined likelihood approximation that can be 
used with partial-sky Gaussian polarized CMB data, or any other set of correlated Gaussian fields observed on part 
of the sky. It is exact in the full-sky limit, and can easily be calibrated from simulations. We also discuss under what 
circumstances a Gaussian likelihood approximation is reliable. 

The layout of the paper is as follows. In section [TT] we present a brief overview of the exact full-sky likelihood 
function for isotropic noise, and discuss the accuracy required in general for unbiased parameter estimation. We 
start section Mil with a review of various temperature likelihood approximations available in the literature, discuss 
the accuracy of the various Gaussian approximations, and move on to derive a new general likelihood approximation 
(Eq. (|47|0 that is exact in the full-sky limit. In section ITVl we test the approximations by comparing with the exact 
likelihood function for azimuthal sky cuts and consistency with the binned likelihood. Finally in section [V] we check 
the approximations with realistically anisotropic noise and demonstrate consistent parameter estimation from simple 
Planck-like simulations. Some mathematical and analysis details are described in the appendices: appendix lAl gives 
identities relating expressions with symmetric matrices to expressions with a vector of components; appendix [B] 
calculates the non-Gaussian correction to the full-sky effective chi-squared; appendix Ogives results for the likelihood 
function when using cross-power spectrum estimators from different maps; appendix [D] reviews the basic Pseudo-C; 
estimator and exact likelihood formalism and appendix [E] slightly generalizes previous hybrid Pseudo-C; estimators 
for anisotropic noise and gives details of our Planck-like test simulations. 

We assume Gaussianity and statistical isotropy of the fields, and focus on the idealized case of pure CMB ob- 
servations without the complications of foregrounds, point sources, non-linear effects, anisotropic beams, and other 
observational artefacts. Generalizing our work to more realistic situations will be crucial for application to real data. 
If the fluctuations turn out to be significantly non-Gaussian or anisotropic a more complicated analysis may also be 
required. 



II. EXACT FULL-SKY LIKELIHOOD FUNCTION 



Observations on the full-sky can be decomposed into spherical harmonics Yi m , for example the temperature at 
position f2 can be written 

T(ti) = j2°LYi m m. (i) 
i m 

The polarization field can be expanded analogously in terms of E and B harmonics with opposite parity, see e.g. 
Ref. If the CMB field is Gaussian, as expected in linear theory, the corresponding harmonic components af m , 

af m and af m are Gaussian variables with zero mean. The CMB power spectrum Cj^ Y determines the variance, which 
is independent of m if we assume statistical isotropy, so that 

(\aL\ 2 )=Cr (\af m \ 2 )=Cr (\af m \ 2 )=Cr- (2) 

The temperature and E'-polarization fields are expected to be correlated, so there is an additional correlation power 
spectrum {\(ij m af m *\) = C\ T , but for a parity-invariant ensemble the B-polarization is expected to be uncorrelated 
to the other fields and the other cross-correlation power spectra are zero. 

Since we only observe one sky, we cannot measure the power spectra directly, but instead form the rotationally- 
invariant estimators, C^ Y , for full-sky CMB maps given by 



1 ' 

— a lrn a lrn*- (3) 



21 , .. 

= -l 



The expectation values of these estimators are the true power spectra, (C* Y ) = Cf- Y . 

To keep things general we consider n (correlated) Gaussian fields, and define an rt-dimensional vector a; TO of the 
harmonic coefficients at each / and m. In the case of the CMB a; m = (af m , af m , af m ) T ■ The covariance matrix at each 
I is defined as 

d ee (B, m ajj, (4) 

and the equivalent estimator is 



Cl = 21 



i-^a; ro a| m . (5) 
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Since the a; m are assumed to be Gaussian and statistically isotropic, they have independent distributions (for \m\ > 0) 
and the probability of a set of a; m at a given I is given by 

I 

- 21n(P({a im }|C0) = [ a LCf V m + ln|2^C,|] = (21 + 1) (Tr[C ; Cf x ] + In |C,|) + const. (6) 

m— — ^ 

The fact that this likelihood for C; depends only on the Cf~ Y (components of the matrix C{) shows that on the full- 
sky the CMB data can losslessly be compressed to a set of power spectrum estimators that contain all the relevant 
information about the posterior distribution. In other words Ci is a sufficient statistic for the likelihood. Integrating 
out all the {a/ m } with the same Ci (or normalizing with respect to C{) gives a Wishart distribution 1 for Ci (for a 
thorough review see Ref. [l2l]): 

PiCACl) OC i^^ e -(2/+l)Tr(C lCr 1 )/2^ (7) 

\Cl\ 2 

The likelihood function for Ci given the observed Ci is C(Ci\Ci) oc P(Ci\Ci), an inverted Wishart distribution. It is 
straightforward to show that the likelihood has a maximum when Ci = C/, so C\ is the maximum likelihood estimator. 
When n = 1, for example when only the temperature is considered, the Wishart distribution reduces to 



21nP(Q|Q) = (21 + 1) 



21 — 1 

c l /c l + bx(c l )-— l MCi) 



const. (8) 



Considered as a function of C'i this is a (reduced) %-squared distribution with 21 + 1 degrees of freedom; it has mean 
(Ci) — Ci, but maximum at Ci — Cj(21 — 1)/(21 + 1). This skewness is also apparent in the likelihood distribution 
C(Ci\Ci) oc P(Ci\Ci), which peaks at Ci = Ci but has mean value Ci(2l + 1) / (2Z — 3). The mean value of Ci calculated 
from the estimators should be above the C;, which is why using a quadratic approximation symmetric in Ci (with 
mean at Ci = Ci) potentially biases results by 0(1/ 1) at each I. 

For n correlated Gaussian fields, there are in general n(n + l)/2 distinct power spectra [C;]^ = (oS* a im)' anc ^ on 
the full-sky their estimators have covariance given by 

cov([C ; ] 4J , [CtU) = ^ ([QUd]^ + [CtUQ]^) . (9) 

It is sometimes convenient to work with vectors rather than matrices, so that X; = vccp(C;) is a vector of the 
n(n + l)/2 distinct elements of C;, and similarly for the estimators. The corresponding covariance matrix is Mi = 
((X;— X/)(X/— X;) T ). For symmetric matrices A and B a useful and somewhat unobvious identity is (see Appendix lA| : 

07 _l_ 1 

vecp(A) T Mf 1 vecp(B) = Tr [AC^BCf 1 ] , (10) 

which can be used to relate results involving Ci to results involving X/. In particular by writing Ti'lCiC^ 1 ] = 
Tr[C;C ; _1 CiCf 1 ] we can write the Wishart distribution in terms of the covariance Mi = Af;(X;) as 

21 + 1 21 — 1 

-21ogP(X,|X,) - 2XfM / - 1 X ; + ^-log|M ; | — log |M,| + const (11) 

n + 1 n + 1 



21 + 1 . 21 - 1 



= 2(X i -X) J Mf 1 X + — -log|M«| — log |M Z |+ const, (12) 

where we used log |M;| = (n + 1) log |C/| + const and M t = M ; (X;). 

We now briefly review the standard Bayesian argument to link the function P(d\a) for the data d given parameters 
a, to the posterior P(a\d), the distribution of the parameters given the data. Bayes theorem states that the posterior 
probability of a given the data is: 

P(c*\d) - m p { ^ a) « C(a\d)P(a), (13) 



1 Technically (21 + l)Ci ~ W n (2l + 1, C ; ) 
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where the prior P(a) gives information we already know about the models. In the case of linear CMB power spectra, 
the Ci can be computed essentially exactly from a set of parameters using standard Boltzmann codes. The probability 
distribution function of a set of parameters given observed data {Ci} = d is therefore given on the noise-free full-sky 
by: 

P(a\{Ci}) ex C({Ci(a)}\{Ci})P(a) = \{C{C l {a)\C l )P{a). (14) 

I 

Since the prior depends on the models under consideration, in this paper we analyse the methods for estimating the 
likelihood £({C;}|{C;}), which is the required input to cosmological parameter estimation codes such as CosmoMC 2 . 
When analysing the likelihood function it is often convenient to normalize so that In C — when Ci — Ci, i.e. to use 

- 2\n£({C l }\{C l }) = ^(2Z + 1) {TrfoCf 1 } - In foCf 1 ] - n} . (15) 

l 

The expected value for this log likelihood is about n(n+l)/2 per Z, corresponding to the n(n+l)/2 distinct components 
of Ci. For a more detailed analysis and discussion of 'chi-squared' goodness-of-fit see the Appendix [Bl 

If there are multiple maps, for example from different frequencies and detectors, cross-map Ci estimators can be 
used to avoid noise bias. If a set of cross-estimators is used the exact full-sky likelihood function is somewhat different 
from the above, as discussed in Appendix [Cj However in the limit of many maps the distribution becomes Wishart. 
In the limit in which there are enough maps that the information loss from using only cross-estimators is small, the 
approximations developed in this paper should therefore also be applicable. 

When the underlying fields are non-Gaussian, the analysis in this paper does not apply directly. However in many 
cases it is likely to be a good approximation to use the same likelihood approximations but with the covariance 
replaced with its full non-Gaussian version including 4-point terms. Non-Gaussianity associated with mode-coupling 
(e.g. from non-linear evolution) can also change the effective number of modes at a given scale. For example the 
£?-mode CMB polarization power spectrum is generated by lensing of an E field by a relatively small number of 
lensing convergence modes. This leads to strong correlations between Z, and a drastically reduced number of modes 
compared to Z^ ax expected for Gaussian fields. Ref. 6] have shown that using a likelihood approximation designed for 
analysing Gaussian fields, but allowing for the full covariance from the non-Gaussianity, can give acceptable results. 
They also demonstrate the importance of modeling the non-Gaussianity of the likelihood function accurately when 
analysing fields that depend on a small number of underlying modes. 



A. Required accuracy 



To assess how accurately we need to be able to model the likelihood we need to know how biases on the posterior Ci 
translate into constraints on parameters. The simplest case is instructive: consider estimating an amplitude parameter 
A, where Ci = AC^ for some fiducial fixed spectrum Cf,. For zero noise and a range of I with Z m ; n < Z < Z max , we 
have 



-2\n£(A\{C l }) = £(2i + 1) jilkfCjC/f 1 ] - In faCjJ 1 ] + nlogA - n j 



and the maximum likelihood value is 



If Cf, is the underlying true model then (A) = 1 and the Fisher variance is 



al^-l-^hxCiAm}) 



2 



A=1 n^l + l) n((U* + l) 2 - £J nll^ 



(16) 



n£,(21 + l) ■ 



(18) 
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for a range of 2 satisfying l max ^> 2 min . We therefore need any biases to give AA <C \J (2/n)/2 max in order for the bias 
on A to be small compared to its error bar. If we have an /-dependent bias 8Ci, the bias on A from Eq. (|17l) is small 
compared to its error if 



l(M>| - n£,(a + l) <K 



The tolerated bias scales as 1/(22 + 1), so this criterion will be satisfied for 2 max 3> 2 r 



i|Tr(C r 1 ( 5C i )|«^^ T ^yjy 
n V n 2/ + 1 V 2n I 



(19) 

for any systematic error with 

(20) 

For a multiplicative bias SCi = BiCi this criterion is \B\\ <C (2n)~ 1 ^ 2 /I. Alternatively if Bi is a constant the 
requirement is \B{\ <C \/ (2/n)/2 max . We shall loosely refer to l/(ly/n) as the 'systematic error', and require biases 
to be much smaller than this, which is appropriate for nearly full-sky observations. For a realistic experiment with 
effective sky coverage / s k y the bias can be ~ fskj 2 times larger. 

In the presence of noise the situation is more complicated. For one field with C; — > C\ + JVj , using the Gaussian 
approximation we require 



(22 



(21 + l)Cf 



Y (Q + 7V/) 2 ' 



(21) 



The bias should be smaller than the systematic error ~ 1/2 where Ni <C Ci, but there is greater tolerance where the 
noise is important. 



III. LIKELIHOOD APPROXIMATIONS 



A. Single-field likelihood approximations 

To approximate the likelihood on the cut-sky, the usual approach when analysing the CMB temperature is to 
develop a form for the log likelihood that is quadratic in some function of the Ci , and hence can easily be generalized 
to the cut-sky using an estimate of the Ci covariance matrix. Here we summarize some common approximations in 
their full-sky form. 

At large 2, Eq. ([8]) is approximated by a symmetric Gaussian distribution where the variance is determined by the 
estimators themselves PI: 



2lnC s (Ci\Ci) 



21 + 1 



Q-Ci 
Ci 



(22) 



This approximation is well known to produce a poor fitting to the true likelihood function at low 2 Q ; being symmetric 
it biases posterior Ci low compared to the true likelihood function. Approximating the exact likelihood of Eq. ([8]) 
with a second order expansion in C//C; — 1 gives the same form but with Ci replaced by C\ in the denominator: 



2\nC Q (Ci\d) 



22 + 1 



Ci 



(23) 



This distribution is closer to the true likelihood, being skewed in the right direction, however it is still a poor 
approximation in general, this time biasing the posterior Ci high. It is often somewhat misleadingly referred to as 
the 'Gaussian approximation', even though it does not have the determinant term required for P(Ci\C{) to be a 
normalized Gaussian distribution 3 . Another possibility is 



2\nC f (C l \C l ) 



22 + 1 



Q-Q 



(24) 



3 For this reason we denote it Cq - for a quadratic approximation - rather than Cq used by some other authors. 
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where C/ z is some fixed fiducial model assumed to be smooth and close to the model Ci under consideration. This is 
more interesting as although the shape of the likelihood is wrong at any given I, as we shall see when summed over a 
range of I it can give results consistent with the exact likelihood function. It is equivalent to a Gaussian approximation 
since the determinant term is a constant when using a fixed fiducial model. Adding a Cj-dependent determinant term 
to the quadratic approximation can also produce valid results; we refer to this as Gaussian^, given by 



2 In £ D (C^G 



21 + 1 



Ci-Ci 
Ci 



InlCil 



(25) 



See Section IIII Bl for more details of this approximation. 

Beyond these quadratic/ Gaussian approximations, other approximations that have been used include the log- normal 
distribution where the log-likelihood is quadratic in the log of the power Q 



-21n£ LN (C,|C,) - 



21 + 1 



In 



Q 
Ci 



(26) 



This distribution is also somewhat biased [!, [f| : it only matches the exact full-sky result to second order in Ci/Ci — 1. 

A weighted combination of the quadratic and the log-normal distributions can be a more accurate approximation 
to the exact likelihood, being correct to third order in Ci/Ci — 1. This approximation was adopted in the analysis of 
the one, three and five-year WMAP data at high I @: 



ln£ W MAp(G|G) = iln£ Q (G|G) + ^ln£ Lj v(G|Q). 



(27) 



Ref. [6(] suggest even better approximations of the form 



2lnC(C l \C l )^(2l + l) 



9 (21 



2 \2l + \ 



1/3 



1/3 



21 



21 + 1 



1/3 



+ (l-a)lna, 



(28) 



where a is one (referred to as '—1/3' approximation) or minus one (referred to as 'l/3'-approximation). The value 

~ 1 /3 

a = 1/3 corresponds to taking the distribution of C l to be Gaussian. These approximations are correct to third 
order in C//C; — 1, and also very nearly correct to fourth order. 



B. Gaussian approximation for correlated fields 



For a model Cf, with corresponding full-sky X; covariance Mfi, a Gaussian approximation to the likelihood function 



is given by 



-2hxC f {C l \C l ) = (X l -± l ) T My l 1 pL l -± l )+log\M fl \ 
2l + l r 



-Tr 



(Ci -COCf^id-C^Cf- 1 +(n + l) log|C 



'/jl 



(29) 
(30) 



In the second line we used Eq. (fT0| . If C/ ; is fixed (independent of C/) the determinant factors can be dropped, 
giving the generalization of the approximation for one field given in Eq. (|24[) . It is worth studying this approximation 
more carefully as it turns out to be very good for smooth models even if the shape of the likelihood function at each 
I is not accurate. To see this, consider how the total likelihood varies with a parameter 9, 



t d]n£ f (e\Ci) 
80 



= £(2Z + l)Tr 



(31) 



and compare with the equivalent result for the exact likelihood function 



2 A±Jl = Y^{2l + l)Tv 



dCi 
86 



(32) 
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This will be zero for the maximum likelihood value 0, and if C/ ; oc Ci(&) then will also maximize the approximate 
likelihood function C f . In other words the approximation returns the exact best-fit value as long as the fiducial model 
is proportional to the best-fit model. If the true model and the fiducial model are both smooth functions of I, this 
will often be approximately true locally, even if it is not strictly true everywhere. An error in the normalization of 
Cf l would effect the error bar on 9. However since we can easily choose a fiducial model with fractional difference 

< 0(1/ Vl), this would only be a small fractional error on the error. The numerical values of the log likelihoods 
typically differ by C(ln(Z max )) (assuming the fiducial model is accurate to 0(1/1); c.f. discussion in Appendix IB1 . but 
Cf is otherwise generally a good approximation for smooth models. 

Note that the above comments only apply to the Gaussian approximation using a fixed fiducial model. If instead 
we make the covariance Mi a function of Ci the best-fit model would differ from the exact result due to additional 
terms in the derivative from the change in the covariance with parameters. However the Gaussian approximation 
is still quite accurate, and unbiased in an average sense. To see this first consider the simple case of estimating an 
amplitude parameter A, where the exact result for the best- fit value was given in Eq. (|17[) . or in terms of X; by 



A = 1 



2£ I AX Z M I 



n 



Ei(2* + 1) 



(33) 



where AX/ = X/ — X/. Using the Gaussian approximation with Mfi = Af;(X;) and expanding we instead get the 
best-fit value 

2 



A' = A- 



E; [kXiM^AXt -n(n +l)/2] 



n 



E;(2Z + l)/2 



n£,(2l + l)/2 



• 0(A" 1/2 r 



3/2n 



(34) 



where A; is the size of the range of I under consideration (assuming I 1). The second term has expectation value 

zero in the true model, and typical variation of order 0(A t 1 ^ 2 l~ 1 ). The third term is of order 0(Af 1 l~ 1 ). So in 

almost all realizations with A; > 1J > 1 we have A' = A + 0(A l 1 ^ 2 l~ 1 ). The Gaussian approximation is therefore 
almost certainly good to within the required error of 0(1/ 1) as long as A; 1. However unless A/ is large it won't be 
much better than required: local features are likely to be more problematic than the overall amplitude (determined 
from A/ = / max )- More generally we can consider the expectation of the log likelihood 



2(ln£ / ({X i }|{X ; }) 



E 

l 



{< 



X, -X 



(*)\T 



X 



Tr 



loglM^l} 



compared to the exact result 



2 (ln£({X ; }|{X ; })) t = £(2/ + 1) {Tr[cf Of 1 ] + log \d\} 



(35) 



(36) 



This is however also true of 



The exact mean log likelihood has a maximum at the true model, when X; = X, 
the Gaussian approximation, both when Mfi is for a fixed fiducial model, and also when we allow it to vary with 
parameters Mfi = Mi(Xi). To the extent that C\ are constant in I, so that summing over I effectively averages the 
log likelihood, we therefore expect the Gaussian approximations to be nearly unbiased. 

In the case when Mfi — Af/(X/) the reliability of the Gaussian approximation depends critically on the inclusion 
of the determinant term. For example dropping the determinant, the mean approximate log likelihood for A where 



X, = AX 



(*) : 



IS 



2(ln£ Q (A\{± l })) t 



E 

l 



(21 + l)n (1 - A) 2 n(n + l) 



A 2 



2A 2 



(37) 



For large / ma x the maximum is at A ~ 1 + (n + 1)/Z m ax rather than 1, so we expect A to be biased high by the order 
of the expected error, confirming that Cq is not a good approximation to the likelihood. If a fixed fiducial model is 
used then the determinant does not affect the likelihood, and we have 



-2(ln£ f (A\{X,})) t = 



E 

I 



{(1-AfXf^Mj? 



Tr 



oc (1 - A) 2 + const, 



(38) 
(39) 



which has a minimum in agreement with the exact likelihood function (A = 1) regardless of the choice of fiducial 
model (though the variance of A would be wrong by the order of the fractional error in the fiducial model). 

The case where Mfi — JW;(X;) is harder to analyse, but it is not a good approximation because the covariance is 
then correlated with the C/ (so the contribution of high- fluctuating C/ is down- weighted by larger covariance there) . 
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C. Noise, binning and the Gaussian approximation 

In the presence of isotropic uncorrelatcd noise n/ m with known power spectrum Ni , the observed field ai m + ni m 
is just another Gaussian field with power spectrum C; + N\. The likelihood functions are then exactly the same as 
without noise, where Ci and C; are replaced with their values including noise. 

Consider a toy problem where we wish to constrain the amplitude of the power spectrum A over some range of scales 
over which the power spectrum is flat. If there are n m Gaussian modes, and we estimate the power spectrum in rib 
equal bins, each bin will have v = n m /rib modes. If each mode has independent Gaussian noise with known variance 
N, each Cb estimator then has a x 2 distribution with v degrees of freedom and mean A + N. The posterior mean of 
A will differ systematically from its maximum likelihood Cb — N by ~ (A + N)/v, which we can take as an estimate 
of the bias obtained in each bin by using a Gaussian approximation. Using all the bins we can constrain A to within 
an error of ~ (A + N) / yfn m - The criterion for the bias to be much smaller than the error bar is then rib <C y/n m . 
Perhaps surprisingly this is independent of the noise: when this inequality is violated a Gaussian approximation would 
be biased for a given bin, even if the signal is noise dominated. Of course if the bin width is increased so that the 
signal to noise in each bin remains constant, then the Gaussian approximation for the binned estimates does improve 
as the noise increases. 

In the case of observations of the CMB over a fraction / s k y of the sky, with useful signal at < / < l max , the 
number of modes is n m ~ /sky(^max — ^min); so f° r the Gaussian approximation to be good for each bin we need the 
number of bins rib <C /g/^max (assuming /^ ax ^> /^j n ). This is violated by the natural full-sky binning into / max 
bins, one at each I (which has optimal /-resolution) , regardless of how large Z max is. For partial sky observations with 
bin-width Aj in I, you would need Aj 3> / s ky^ 2 f° r the Gaussian approximation to be reliable. However often we 
do not actually need each bin to be individually unbiased, so this criterion can in practice be relaxed. 

Binning different is together makes the distribution more Gaussian, so binning full-sky C'i into bands of width 
A^ 3> 1 would allow any of the quadratic likelihood approximation to be used with very small bias at each bin. For 
basic vanilla models it is straightforward to assess the impact of binning on parameter constraints: we generated a 
toy full-sky simulation at Planck sensitivity [13| , generated samples of the posterior parameter values from the exact 
likelihood function using CosmoMC [l4[ , and then importance sampled using the exact likelihood function on binned 
values of the C\ (keeping the I < 30 spectrum un-binned where in realistic cases the likelihood could also be calculated 

exactly). Using the quadratic approximation Cq in this case (with A^ = 1) biases parameters like the spectral index 
by around 1-sigma compared to the exact result; however using Cf with a sensible fiducial model produces unbiased 
constraints (see previous subsection). Binning with a width A^ = 50 degrades parameter error bars by only < 10% 

for basic models; this would be sufficient to make the bias a tiny fraction (~ 1/A; ') of the error bar on each bin. 
Bins of A/ ~ 10 would likely be wide enough to render the error from a quadratic likelihood approximation small 
relative to other systematic errors. The cost of doing this is that some /-resolution of the acoustic peak structure is 
lost, and any non-standard models with features that vary over a few I could not be analysed reliably (for example 
see Ref. pj|). 

As we shall show, modelling the non-Gaussian distribution accurately is straightforward, and in any case a Gaussian 
approximation is often adequate, so for full-sky observations there is no need to degrade the data by binning. Note 
that binning may however be useful for other reasons, for example to increase the accuracy with which the covariance 
can be estimated from a fixed number of simulations, or to improve the optimality of the cut-sky Ci estimator. Since 
almost all theoretical power spectra are very smooth in I, binning is likely to lose little information as long as the bins 
are narrow compared to the width of any features. 



D. Partial Sky Likelihood function 



When observations are obtained over part of the sky, or part of the sky is obscured by foregrounds or there is 
anisotropic noise, the maximum-likelihood estimators Ci can no longer be measured directly. The CMB is still 
expected to be Gaussian however, so in principle there is an exact pixel-based likelihood function of the form 

e -p T C p _1 p/2 

C({Ci}\p) ex 1/2 , (40) 

where p is a vector of pixel values and C p is the pixel-pixel covariance (a function of {C{\). Equivalently the CMB 
fields can be expanded in a set of modes that are orthogonal and complete over the observed sky, and the likelihood in 
terms of these mode coefficients will also be Gaussian [8|, [ll| [I?} ■ Neither likelihood function can be expressed solely 
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in terms of a set of maximum-likelihood power spectrum estimators, so an optimal analysis does not allow radical 
compression. The problem with using the exact likelihood function is that the number of pixels goes like lf nax , so the 
Cholesky decomposition required to calculate C~ 1 p will scale like ^ ax , which is prohibitive for Z max larger than a few 
hundred and slow for / > 30. Gibbs sampling methods avoid doing large matrix inversions, but still have exponential 
convergence problems if an exact analysis is attempted for general Ci . A sensible strategy is therefore to use an exact 
likelihood only at low I where it is numerically feasible, and to use an approximate analysis at higher I [H, The 
most obvious way to do this is to compress the high-Z data into a set of cut-sky power spectrum estimators, and then 
find an approximate likelihood function that is a function only of these estimators. There is some evidence that doing 
this is close to optimal, and it has the advantage of being fast. This means that numerous practical complications 
can be accounted for simply by adding additional terms to the covariance matrix estimated from simulations. 

There are various possible estimators for the cut-sky power spectrum that can be used, varying from maximum 
likelihood to a variety of quadratic estimators. At high I quadratic estimators can be close to the maximum likelihood 
and we focus here on the widely used Pseudo-C; methods [H, l20l - |26j that are in many cases equivalent to methods 
based on correlation functions [23, [28| . In principle the statistical distribution of these estimators could be calculated 
exactly (2lj . but only at prohibitive numerical cost in general. We therefore look for a fast likelihood approximation 
that is a function only of the set of cut-sky estimators {C;}, an estimate of their covariance (e.g. from simulations or 
calculated), and knowledge of the noise contribution {iV;}. One of the aims of this work is to quantify whether such 
a likelihood approximation is good enough to obtain reliable and nearly-optimal parameter constraints. As our guide 
for modeling the non-Gaussian shape of the likelihood function we will use the known form in the full-sky limit; we 
aim for our approximation to be exact when the {C;} are calculated on the full-sky with isotropic noise. 



E. New likelihood approximation for correlated fields 



We now derive a new likelihood approximation that can be used with Ci estimators calculated from correlated 
Gaussian fields. It is exact on the full-sky, and should give reasonable results even for non-standard models that are 
not necessarily very smooth functions of I. The approximation involves a fiducial model so that the covariance can 
easily be pre-computcd. However errors in the fiducial model are automatically corrected, in that the result remains 
exact on the full-sky however wrong the fiducial model is. We assume that the matrix of estimators C/ is positive 
definite, which may break down for some estimators at low I. 

Given the observed estimators C; for the covariance of n Gaussian fields, the full-sky likelihood function can be 
written 



2 log C(Ci\Ci 



CiCf 



togjC^Cil-n} 



(2/ + l){ 

(21 + 1) {Tr [cf 1/3 6iCi 

(2i + i)5^[A,«-iog(A,«)-i]. 



-1/2 



-iog|cr i/2 c,cr 1/2 i-»l 



(41) 
(42) 
(43) 



The symmetric form is defined using the Hermitian square root and C l 1 ^ 2 CiC l = UiDiUf for orthogonal Ui 
and diagonal L>i. In the presence of instrumental noise the C/ and Ci should include the noise variance. 
To generalize to the cut-sky we want to make this look quadratic, so we write 



21og£(C,|C,) = 



21 



)] 2 = ^±±Tr [g{ Dl f 



(44) 



where 



g{x) = sign(a; - l)y/2(x - ln(x) - 1), 



and [g(L>i)]ij — g(Di t a)Sij. Although the sign of the function is irrelevant for consistency with the exact full-sky 
result, this choice ensures consistency with the Gaussian approximation and that g(x) is a smooth function at x = 1. 
We now want to relate this quadratic form to a version that is quadratic in the matrix elements. To do this we use 
Eq. (fTI?)) in the form 



21 + 1 



Tr 



( C fl 1/2 Cgi C fi 



-l/2\2 



= XjM f 7 1 X 



fi 



^9V 



(45) 
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where X ff/ = vecp(C ff; ) (dimension n(n + l)/2) is the vector of distinct elements of C g ,, and Mf, is the covariance 
of X evaluated for C; = Cf,. We therefore write the exact result of Eq. (|4"4"|) as 

= X g jM f - 1 X 9l , (46) 

where C g[ = C f 1 ^ 2 Uig{Di)Uf C f 1 ^ 2 for some fiducial model Cf,. This can then be generalized to our final cut-sky 
approximation where the estimators at different I may be correlated: 

- 2bg£({C,}|{6,}) « XjM/^X, = £[X a ]f [M / - 1 ] !r [X fl ]r. (47) 

Here iW/ is the fiducial model covariance block matrix with n(n + l)/2 x n(n + l)/2 blocks labeled by / and /', and 
X 5 is a (/max — /min + l)n(n + l)/2-row block vector: 

[M^h- = ((X ! -X i )(X r -X ! ,) T ) / (48) 
[X g }t = vccp (C)f ff [Cr 1/2 aCr 1/2 ]C}( 2 ) , (49) 

where the matrix function g applied to a symmetric positive definite matrix is defined by application of g to its 
eigenvalues. On the full-sky with isotropic noise [Mf]ii> = Su'Mt, and the approximation is exact. It is fast to 
evaluate because Mf -1 is independent of C/ and hence can be pre-computed. Remaining diagonalizations on the 
small matrices at each / are fast. In principle the fiducial model Cf, could also be chosen to be equal to Ci or 
C;, but for most purposes using a fixed smooth theoretical fiducial spectrum that is a good fit to the data is likely 
to be most convenient. For a general correlation structure Mf has [(/max — /mm + l)n(n + l)/2] 2 elements (but is 
symmetric). Remember that here C; and Ci include the noise contribution, so for a pure-theory (zero-noise) C* h the 
approximation requires an (effective) noise TV/ at each /, a covariance matrix, and the set of estimators {Ci}. 

If Ci is block diagonal, as in the case of CMB polarization with B modes, the exact full-sky likelihood is separable in 
the blocks. On the cut-sky the estimators for the blocks may however be correlated; in particular a sky cut will correlate 
E and B-mode polarization estimators. The approximation can be applied with full [(/max — Zmin + l)n(n + l)/2] 
vectors, or the approximation can be applied to a truncated vector including only terms in each block. For example we 
could use X; = [Cf T , Cf E , C EE , C BB ] T , with covariance allowing for correlations between E and B power spectra, 
but ignoring any potential information in components like Cf B (the full-sky likelihood is independent of Cf B , but this 
may not be the case when there are couplings between T, E and B). If the smaller vector is used the transformation 
to X g can be calculated for each block separately. 

For a single Gaussian field the approximation is simply 

- 21og£({C,}|{C,}) w Y^iCi/C^Cf^Mf-^u-iC^giCv/Cv)). (50) 

w 



21og£(C ; |C / ; 



U + 1. 



-Tr 



(Cf, 



1. Generalization 

On the full-sky, and in some generalizations, the distribution of the estimators Ci scales approximately with Ci, so 
that P(Ci\Ci)dCi — Si(Ci/Ci)(dCi)/Ci for some function S(x). The full-sky likelihood function considered above is 
of this form. In general S(x) can differ from the full-sky form, and could be estimated approximately from simulations 
using a given fiducial C/. The likelihood function is then given by £(Ci\Ci) cx 5;(C;/C;)/Cj. We can then use the 
same likelihood approximations as above, where for each / 



g(x) = sign(x - x m )\ 2a 2 log 



Xra Si {x ra ) 



xSi(x) 



(51) 



is the value of x that maximizes xSi(x), and a 2 — var(x) (on the full-sky x m = 1, a 2 = 2/(2/ + 1)). With multiple 



— 1/2 

fields a similar argument applies as long as the likelihood function can be written in terms of C l CiC. 



-1/2 



The 



function Si(x) can then be estimated from the distribution of the diagonal elements of C ; C;C ; 1/ ' at fixed C\. 

The exact distribution of single-field pseudo-Czs is discussed in Ref. pH | for azimuthally symmetric sky cuts. Even 
in this simple case with no noise the marginalized distribution at each / is of a different functional form from the 



-1/2 
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full-sky result, similarly for the corresponding C/-estimators. Using Pseudo-Cj estimators with our approximation 
using g(x) = sign(x — l)y/2(x — \n(x) — 1) amounts to approximating the marginalized distribution of the Cj as x 2 
with vi degrees of freedom, where vi = 2Cf /var(C;). At high I and for small cuts with uniform weigh ting outside the 
cut vi ~ (21 + l)/ s 2 ky [13]; for binned estimators that are nearly uncorrelated, v\ ~ (21 + 1)A;/ S k y (22l. l30j] . 



2. Gaussian approximation 

The Gaussian approximations of Section IlII Bl generalize straightforwardly to a (' max — i mm + l)n(n + l)/2-vector 
of cut-sky estimators X with a covariance matrix Mf, 

- 21og£ / ({X ! }|{X z }) = (X - X) T M7 1 (X - X) + log \M f \. (52) 

Note that even with no correlations between I this cannot be written as a matrix variate normal distribution in the 
form of Eq. (|30| because a general M has many more degrees of freedom than the exact full-sky matrix where Mi (a 
symmetric n(n + l)/2 x n(n + l)/2 matrix) can be expressed in terms of the smaller matrix C; (an n x n symmetric 
matrix). From the discussion in Section [ill Bl we expect the Gaussian approximations to be accurate for ' max 3> 1 in 
almost all cases where parameter variations produce changes that are smooth in I. 



IV. TESTING THE LIKELIHOOD APPROXIMATIONS 



For accurate parameter estimation we need to be able to constrain the theory C; accurately as a function of I 
given the estimators Ci . On the full-sky the likelihood approximations can easily be compared to the exact likelihood 
function. We fit an amplitude parameter A where £(.A|{Ci}) = C({Ci — AC[ n }\{Ci}), over some range of ' using 
some fiducial model C™. The Ci are simulated using C z m , so that on average the best- fit value of A is A = 1. 
Since in almost all models the theory power spectra Ci are smooth functions of ', and we wish to check that off- 
diagonal correlations are being accounted for correctly, we chose to fit over a range Al = 10 in I. This was done for 
I = ('min — >• 'max = 'miii + Al — 1), i.e., bins of size Al with i m ; n and 'max being the lower and upper values of ' in each 
bin, respectively, as a function of ' m ; n - 

Using a standard search routine 4 , we searched for the best fit value of A, A. In other words, for the exact likelihood 
and each approximation, we numerically extracted the amplitude that would maximize the likelihood. We then 
estimate the variance of this estimated maximum likelihood value of A compared to the true maximum likelihood in 
that realization, ((A4 — AE xa ct) 2 )simuiations- This gives a measure of any error introduced by the approximation. Note 
that since we are using a range of = 10 in I, the best-fit value of A depends on the likelihood approximation at 
each I value, and in particular probes the full range of deviations of Ci from C; expected from cosmic variance. 

To quantify whether an approximation is good enough, we consider how well we need to know the amplitude of the 
Ci as a function of ' to get unbiased results on an amplitude parameter. We consider the noise-free case. The cosmic 

variance error on a single ', is y ( 2 i+i) ' ^ mce we are averaging over a range Al — 10, the cosmic variance error we can 
obtain on A from a single bin will be reduced by a factor of Al, hence a fractional error of ~ \J (21 in +i)A; f rom onc 
band. However as discussed in Section III Al for unbiased results from the full spectrum we need a fractional average 
systematic error on the Ci much smaller than AC1/C1 <C n -1 / 2 //. We therefore require likelihood approximations 
that give values that are unbiased to better than the systematic error. 



A. Full-sky tests 

On the full-sky with isotropic noise estimators at different ' are uncorrelated: the likelihood function is £(A|{C;}) = 
Ili=r £(Cz = ACi n \Ci), where C can take the form of the exact likelihood or any of the approximations described 
in section Ull Al 

In Fig. [T] we show the results for the temperature likelihood approximations on the full-sky. We calculate on average 
over simulations the difference between the posterior amplitudes, \(Ai — ^Exact)! (to probe bias) and the variance 



4 Fortran 90 numerical recipes: Golden Section Search. 
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(\Ai — ^Exact| 2 ) (to probe posterior differences in each realization). We require both quantities to be smaller than 
1/7, where Aj is the best-fit value from one of the likelihood approximation given in section UlI Al As expected, the 
symmetric Gaussian distribution Cs shows a very poor fitting as its variance is larger than the systematic error. The 
quadratic approximation Cq gives results almost identical to the systematic error and hence is not a good enough 
approximation. The Gaussian^ results are probably good enough, but the WMAP-approximation and approximation 
developed in Ref.@ are much better. The fiducial Gaussian approximation is exactly unbiased in this simple test and 
is not shown. Any of these last four approximations should be adequate for temperature parameter estimation, at 
least assuming cut-sky accuracy with realistic noise follows the full-sky behaviour. The new likelihood approximation 
by construction is also exactly correct in this full-sky case. 




FIG. 1: The plot compares various likelihood approximations on the full-sky for the case of a single field (temperature only) 
and no noise. The left-hand panel shows the difference between best-fit posterior amplitude of a A; = 10 bin with the likelihood 
approximations and the exact likelihood over 10000 simulations where AExact is the best fit amplitude of the exact likelihood 
and As, Aq, Ad and Awmap are the best fit amplitude of the symmetric Gaussian, quadratic, Gaussian d and WMAP 
approximations respectively. The right-hand panel shows the root-mean-square difference. These two quantities are compared 
to the systematic error tolerance. Only the symmetric Gaussian and the quadratic approximations are clearly not good enough. 
The fiducial Gaussian and new likelihood approximations are not show as they are exactly unbiased in this simple test case 
with correct fiducial model. 
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B. Cut-sky tests 



We now move on to test the approximations on the cut-sky. In particular we want to check that any bias on 
parameter constraints is much smaller than the posterior error, and that the likelihood function has the right shape. 
To do this we calculate simple Pseudo-C; estimators for azimuthal cuts with isotropic noise where the exact likelihood 
function can also be computed in reasonable time. Although idealized, realistic cuts are often approximately azimuthal 
due to the disc-shape of the galaxy, and consistency in this simple case is clearly necessary (if not strictly sufficient) 
to justify the use of a given likelihood approximation. An azimuthal cut introduces most of the qualitative differences 
in a cut-sky analysis, namely correlations between different I and not-exactly Wishart distributions of the Cj. The 
detailed derivations of the Pseudo-C; estimators, the covariance matrix and the exact likelihood for correlated fields 
are reviewed in Appendix|DJ We test the more general case of anisotropic noise and asymmetric cuts later in SectionfVl 



1. Single-field Results 




FIG. 2: Single field likelihood approximation results for the likelihood as a function of bin amplitude, A. The plot compares the 
likelihood approximations to the exact likelihood for an azimuthal galactic cut with / s k y = 0.862, Z max = 600 and bin located 
at 200 < / < 209 for one realization. 

The approximations used in the analysis of the temperature power spectra in the cut-sky are given below, where 
Ci are taken to include noise and \M~ x \iv is the inverse of the covariance matrix [M = JW(X), M = M(X)] when 
only a single field is considered: 

-21nW = ^(^- Q )[ M_ Vfe-C^ +H^l n ^Q[M-V^'ln('^Y (53) 
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21n£_ 1/3 - 9]T(C7 1/3 - Cr 1/3 )C4 1/3 [M-V^/ 3 C F ((7 ; 7 1/3 " C^), 
w 



(54) 



2\nC D = Y,[Ci-Ci)[M- 1 ] w [Cv - C v ) + log |M 



(55) 



(56) 



where [Afy]/j/ is the covariance of some fiducial model, similar to the one used in New Likelihood (see Eq. (1501) ). 

Fig. [5] shows the exact likelihood and the approximations presented in this subsection as a function of the posterior 
amplitude for a bin in one simulation. We consider both cases of noise-free and noisy power spectra. The approxi- 
mations compare well to the exact result in both cases, though the results for the Gaussian approximations are not 
the right shape far away from the peak. Simulations were performed for azimuthal cuts with / s k y = 0.862 5 . We have 
also fixed Ci at I < 30 to Ci — Ci to prevent occasional negative values in the simulations. 



2. Correlated- field Results 




5 that is a Galactic cut of 20°. 
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FIG. 3: The likelihood as a function of bin amplitude, A, for the temperature and polarization fields in one realization. The 
black (solid) line is the exact likelihood, the red (dotted) line is the new likelihood and the blue (dashed) line is the fiducial 
Gaussian distribution. Unlike the fiducial Gaussian distribution which only agrees well around the peak, the new likelihood 
captures the shape of the exact one well. We used an azimuthal cut with / s k y = 0.862, i max = 500 and bin at 150 < I < 159. 
Noise is isotropic and uncorrelated and the E and B modes noise is twice the T noise. 




FIG. 4: The difference between the average of the posterior amplitude and the true input model compared to the systematic 
error (red (solid) line). The green (long-dashed) and the black (dashed) lines represent the differences for the new likelihood and 
fiducial Gaussian, respectively. The curves clearly do not show any significant bias in the posterior amplitudes. The averages 
were taken over 5000 simulations (realizations) for Z max = 800. Simulations were performed for spin-0 T and _E-mode only and 
for azimuthal cuts with / s k y = 0.862 and a bin-size set to 10. 



To obtain unbiased results on an amplitude parameter from n noise-free correlated fields we need the systematic 
fractional bias on the amplitude to be <C l/ly/n. With more than one field there is of course a lot more freedom 
than simply a change in amplitude. Nonetheless it is a useful first test as many important parameters, such as those 
governing the primordial power spectrum, affect the C; essentially through an i-dependent scaling. If there is an 
apparent systematic error SCi in the C\ spectrum, the criterion for an unbiased amplitude is Tr[Cf 1 SCi]/n -C 1/ly/n. 
Since in practice polarization observations are likely to be noise dominated compared to the temperature for the near 
future, any approximation that satisfies this criterion will be more than adequate. We should however also test 
for accuracy of the likelihood to other changes in the spectrum, for example the degree of cross-correlation, as an 
amplitude scaling is a very special (if relevant) case. 

We first test the approximate likelihood function compared to the exact result (see Eq. (|D34[) for exact likelihood 
function used); the result is shown in the Fig. |3] The new likelihood approximation compares quite well with the 
exact likelihood, though it is slightly broader due to the loss of information from compressing the data into a set of 
pseudo-C/ power spectrum estimators C;. The fiducial Gaussian approximation shows significant deviations from the 
shape of the exact likelihood far from the peak. 

For a quick analysis, the tests in the rest of this section were performed for spin-0 T and i5-mode only, i.e. the 
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i?-polarization was simulated as a scalar field similar to temperature so that E-B mixing may be ignored (but T-E 
correlations correctly accounted for). For all simulations, we also fix Ci at I < 30 to Ci = C/ to avoid negative 
estimators and use a bin-size of A; = 10. 

The first consistency check is that on average over simulations \(A) — 1| <C 1/ly/n: this is sufficient to check that 
there is no significant bias in the posterior amplitude. We ran simulations for an azimuthal cut with f s ^ y = 0.826 with 
the results shown in Fig. [4] The new likelihood and the fiducial Gaussian approximations appear to be unbiased. 

We can also check the consistency of the likelihood function by comparing the binned and un-binned likelihood: as 
discussed in Section llll Cl the likelihood function for bins with ^> 1 should be accurately Gaussian. For a smooth 
power spectrum binning can be performed with very little loss of information, and so the likelihood P({Cb]\{Cb]) 
can be calculated essentially exactly in the Gaussian approximation. We can check that this is consistent with the 
likelihood approximation evaluated using each Z; if it is, then we are using the information in the Ci essentially 
optimally, at least when the spectrum is very smooth (even if compressing the sky into a set of C'j-estimators is 
not optimal). Similar to the full-sky single-field analysis, we calculate on average over simulations the difference 
between the posterior amplitudes, \(A a — Ab)\ and the variance which is the square of the difference, (\A a — At\ 2 ). 
We again require both quantities to satisfy the criterion set earlier, i.e. \(A a — Ab)\, (\A a — A^ 2 ) 1 / 2 <C 1/l^/n. Fig. [5] 
compares fiducial Gaussian, binned fiducial Gaussian, new likelihood, binned new likelihood and Gaussian^. The 
plot clearly demonstrates that these approximations would produce the same results and are good enough to be used 
in analysing CMB data. Fig. [5] shows the comparison between the Cs approximation (Gaussian with variance given 
by Ci), binned Cs Gaussian, new likelihood and binned new likelihood. This shows that Cs is strongly biased when 
used with un-binned estimators, but when the data is binned it can produce consistent results as expected. 

The Gaussian approximation with varying covariance, Gaussian^, is significantly slower to compute than the other 
approximations. It is compared to the fiducial-model Gaussian in Fig. [7] for a small number of simulations. Since the 
fiducial- Gaussian result is unbiased this shows that Gaussian^ is also unbiased to good enough (though not excellent) 
accuracy in this case. 



V. PARAMETER ESTIMATION TESTS WITH ANISOTROPIC NOISE 



So far we have been using azimuthally symmetric cuts and assuming that the noise is isotropic. Isotropic noise is 
particularly simple case because the variance of the Ci estimators scales as oc (C; + Ni) 2 to a good approximation. 
When the noise is anisotropic, as in realistic observations, this is no longer the case in general, and it is important 
to test the likelihood approximations in this more realistic situation. For example using a fiducial model covariance 
in our approximation of Eq. (|47|) was motivated in the case where everything is a function only of (C/ + JVj). In 
general it may be necessary to instead evaluate the covariance for each theoretical model to correctly account for the 
more complicated scaling of the covariance with the signal. This could be done for example by re-scaling a sum of 
covariance matrices calculated for noise-only, signal-only and signal plus noise realizations in some fiducial model. 
Although perfectly tractable, we shall see that in the case of Planck the simpler fiducial model approximation appears 
to be adequate. 

We test the likelihood approximations by performing parameter estimation using single sky maps simulated corre- 
sponding to an idealization of the combined Planck 143Ghz channels with 7arcmin symmetric Gaussian beam [l3j . 
The Planck satellite scanning strategy samples points near the ecliptic poles more densely than near the equator, and 
so there is a large (~ 100 factor) range of noise values across the sky 31|. In addition we use the 'kp2' map 6 [HJ 
as a semi-realistic sky cut to simulate masking out the galaxy and point sources. Details of our simulation, hybrid 
Pseudo-C; analysis and covariance model (following Ref. [l8| ) are given in Appendix [El In the high signal to noise 
regime the hybrid estimator uses an approximate inverse-noise weighted map with sky cut. As shown in Fig. [5] this is 
highly anisotropic. This inverse-noise weighted map is combined with a uniform-weighted map to give C\ estimators 
that are fairly close to optimal on all scales with I > 30. For our simple test we assume a noise level average equiv- 
alent to the number for the 143Ghz channel quoted in the Planck science case 13]. We take the polarization and 
temperature pixel noise to be uncorrelated and proportional, with the polarization noise a factor of four larger than 
the temperature. 

We use the range 30 < I < 2000 for test parameter estimation from simulations; the low I likelihood is problematic 
because the Pseudo-C/ estimators are not guaranteed to be positive definite, and the covariance structure becomes 
complicated due to E/B mixing effects on the cut-sky. It may be possible to obtain reliable results from the Ci directly 



1 http://lambda.gsfc.nasa.gov/ 
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FIG. 5: Comparison between various binned and un-binned likelihood approximations. The left plot shows the average of the 
difference between the posterior amplitudes of these likelihoods and the right plot shows the variance, both compared to the 
systematic error (red (solid) line). The black (dashed), the green (long-dashed), the cyan (dashed long-dashed) and the blue 
(dotted dashed) lines represent the comparison between binned fiducial Gaussian and new likelihood, binned new likelihood 
and new likelihood, fiducial Gaussian and binned new likelihood and binned fiducial Gaussian and binned new likelihood, 
respectively. Averages were taken over 200 simulations (realizations) for Z max = 800. Simulations were performed as previously 
mentioned. Results are all consistent to the required accuracy. 



(without inverting to the unbiased estimators), using maximum-likelihood or other more optimal estimators, however 
at low I the likelihood function can also be calculated essentially exactly in reasonable computational time, so here 
we focus on the higher I region where an exact analysis is intractable. Investigation of the low I likelihood function 
for Planck-like noise, how to combine with higher-Z approximations, and dealing with real-world complications such 
as foregrounds is beyond the scope of this paper. 

From the simulated Cj-estimators we calculate the likelihood function of a given theoretical model using a likelihood 
approximation. This is used in the CosmoMC 8 parameter estimation code to sample from the posterior parameter 
distribution [14j. For our tests we consider a vanilla adiabatic flat A-CDM model, with baryon density fi^/i 2 , dark 



7 If only temperature is used then the new likelihood approximation works reliably with Pseudo-C; estimators down to I = 2 in almost 
all realizations. 

8 http://cosmologist.info/cosmomc/ new CMB likelihood module at http://cosmologist.info/cosmomc/CMBLike.html 
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FIG. 6: Similar comparison as in Fig. 6 but using the symmetric Gaussian approximation Cs- Unlike the binned and un-binned 
fiducial Gaussian, the binned and un-binned symmetric Gaussian approximations £s show significant bias. Averages are over 
100 simulations (realizations) for Z max = 1000. 



matter density fl c h 2 , amplitude, spectral index and running of the primordial power spectrum (A s , n s and n run ), and 
the parameter 9, 100 times an approximation of the ratio of the sound horizon to the angular diameter distance at 
recombination. The age, Hubble parameter (l?okms _1 Mpc _ ) and matter density relative to critical f2 m are derived 
parameters. Since we are only considering the likelihood at I > 30 we fix the optical depth to reionization; our 
simulated parameter constraints are therefore tighter than expected from a full realistic analysis. 

Figure [5] shows the consistent marginalized parameter constraints obtained when using the new likelihood approx- 
imation to analyse a set of sky simulations. Very similar constraints are obtained whether noise-dominated B power 
spectrum estimators are included or not, at least when there are no tensor modes. The new likelihood approximation 
seems to work well with realistically anisotropic noise. 

Since in reality we will not know a priori exactly what fiducial model to choose, it is important that results be robust 
to choosing a slightly wrong model. Figure [10] compares the results from one simulation using the new likelihood 
approximation compared to using the fiducial-model Gaussian approximation; the fiducial models have n s = 1 (wrong) 
and n s ~ 0.955 (true), a difference of many sigma at Planck sensitivity. All the results are broadly consistent, but the 
fiducial-model Gaussian approximation shows some dependence on the choice of fiducial model. The new likelihood 
approximation results are more independent of the choice of fiducial model, and so appear to be more robust as 
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expected 9 . The values of the goodness-of-fit parameter xte ( see Appendix [B| are also much more stable for the new 
approximation compared to the fiducial Gaussian; the new likelihood approximation best-fits differ by A%^ ff ~ 4, but 
the fiducial-model Gaussian approximations differ by A%^ ff ~ 400. With a fiducial model chosen to be sensibly closer 
to the maximum likelihood model both numbers should be significantly smaller. 

Although detailed analysis of secondary signals is beyond the scope of this paper, in the Appendix IE 41 we show 
that with Planck noise levels our likelihood approximations also work when applied to lensed CMB fields and the 
covariance is estimated simply by using the lensed power spectra. 



VI. CONCLUSIONS 



In this paper we have attempted to find solutions to the problems facing the likelihood analysis of the CMB 
temperature and polarization estimators on small scales. With realistic data we need to be able to calculate the 
likelihood accurately from partial sky observations. Previous attempts have established some excellent approximations 
to model the non-Gaussianity of the temperature likelihood function. However, no good general approximation has 
been derived to model the polarized likelihood. At large I computing the likelihood function exactly is computationally 
prohibitive and the correlation between the temperature and polarization fields makes it more complicated than for 
the temperature field only. We gave a new general approximation that can account for this correlation and is exact 
on the full-sky. This new approximation is fast to evaluate as it involves a pre-computed covariance independent 



9 Some of the difference here is due to changes in the hybrid pseudo-C; estimators when the fiducial model is changed. 
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FIG. 8: Smoothed regularized inverse-noise weight map with WMAP kp2 cut as used by our test Plank-like simulation analysis. 
Noise is lowest in the cuspy regions around the ecliptic poles. The cut gives zero weight to regions around the galactic plane 
and numerous point sources. Noise and cut are smoothed with a 7arcmin-fwhm Gaussian. 
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FIG. 9: Parameter constraints from six idealized Planck-like single map simulations with anisotropic noise as described in 
the text. The 1-dimensional marginalized posteriors are from using the new likelihood approximation with hybrid Pseudo-C; 
temperature, E-polarization and cross-correlation estimators at I > 30. The optical depth was fixed, and the simulation input 
parameters are shown with vertical lines. Very similar results are obtained if the noise-dominated B-polarization estimators 
are included with no tensor modes. 
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FIG. 10: Parameter constraints from a single idealized Planck-like simulations with anisotropic noise. The 1-dimensional 
marginalized posteriors are from using the new likelihood and the fiducial Gaussian approximations, and compare the results 
obtained when assuming an exactly correct fiducial model or using a wrong n a = 1 model. The red (dotted) line is the new 
likelihood with the right model, the black (solid) line is new likelihood with the wrong model, which agree very well. The green 
(short-dashed) line is the fiducial Gaussian with the right model and the blue (dotted-short dashed) line is the fiducial Gaussian 
with the wrong model. The new likelihood results are consistent but the fiducial Gaussian results are slightly affected by the 
choice of the model. 

of Ci, and appears to be more than adequate to obtain robust parameter constraints from clean small-scale CMB 
temperature and polarization data. 

In summary, our conclusions regarding the modelling of the likelihood function of power spectrum estimators are: 

• In the case of binned power spectra, the number of modes per bin (rim/rib) must be much larger than the 
number of bins (n&) for non-Gaussian corrections to the likelihood function to be unimportant in all cases; i.e. 
n-b -C y/n m is required to ensure that parameter bias is much smaller than the error bar. 

• A Gaussian approximation with fixed fiducial-model covariance gives unbiased results for smooth power spectra 
at high I, but error bars have some dependence on the choice of the fiducial model. Goodness-of-fit estimators 
XcB can be misleading even for small differences between the fiducial and true model. 

• A Gaussian approximation with covariance that varies with parameters can give reliable results at high I for 
smooth spectra, but only if the determinant-term is consistently included; the quadratic approximation without 
determinant, £q, is biased in general. 

• The new likelihood approximation presented in Section [ill El appears to work well for power spectrum estimators 
with correlated fields and can give nearly optimal results when applied to good power spectrum estimators. It 
is fast to evaluate as it relies on a pre-computed fiducial-covariance matrix, but is insensitive to small errors in 
the fiducial model. We recommend it for future work. 



22 



• Most likelihood approximations with binned estimators (n& <C \fn m ) can produce consistent results by the 
central limit theorem; for smooth power spectra consistency of parameter constraints with those from binned 
power spectra is a good check. 

Since the new likelihood approximation is based on estimators and a covariance matrix, it is likely to generalize 
well to more realistic data where additional uncertainties, non-Gaussianities and correlations can be accounted for 
via changes to the estimator covariance. It is also likely to produce good results down to low I if positive-dehnitc 
estimators are used, though this has not been the focus of this paper. Complications such as correlated noise may be 
well encapsulated in the covariance of a set of maximum-likelihood (or similar) estimators, giving a fast alternative to 
much slower brute-force likelihood calculations. If the approximation is nearly correct, importance sampling techniques 
could be used to correct the results with a much smaller number of high-accuracy calculations. 

We have not touched at all on the complications of foreground modelling, point sources, non-linear and non-Gaussian 
anisotropies (e.g. due to SZ), beam uncertainties, or a plethora of other real-world complications. Extending our 
work to account for these will be crucial for the correct interpretation of future data. 
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Appendix A: Useful results for matrix vectorization 



In this appendix we review some results from matrix theory relating equations involving matrices to those involving 
vectors of their components, and establish Eq. (fTU)) in the main text. For further details and references see e.g. 
Ref. Q. 

The elements of a general matrix A can be assigned column-wise into a vector vec(A). For matrices A and B 



Tt[A t B] = vec(A) T vec(B). 
The Kronecker product of an m x n matrix A with and p x q matrix B is defined to be the mp x nq matrix 



B 



/A n B A 12 B ... A ln B\ 
A 21 B A 22 B ... A 2n B 



Using this we can write 

and using Eq. (IA1|) this implies 



\A m iB A m2 B ... A mn B) 
vec(ABC) = (C T <g> A)vec(B), 
Tr [A T DEF] = vec(A) T (F T <g> D)vec(^). 



(Al) 



(A2) 



(A3) 



(A4) 



For a symmetric nxn matrix there are only n(n+l)/2 distinct elements, and we define vecp(A) to be the corresponding 
vector of distinct components of A 



vecp(A) = (An, A 2 i, ... A„i, A 22 , A 32 , . . .) . 
The matrix n 2 x n(n + l)/2 matrix B n is defined so that for a general square matrix A 

vecp(A) = Bjvec(A) = B^vec(A + A T )/2. 



(A5) 



(A6) 



For example, a 2 x 2 matrix A has B^vec(A) = (Ai, (Al2 + -^-2i)/2, A 22 ) . The pseudo-inverse = 
(BTB n )~ 1 B^ can be used to construct vec(A) from vecp(A) when A is symmetric: 



vec(A) = (B+) J vecp(A) 



(A7) 



23 



Applying Eq. (|A4I) to symmetric matrices A and D we then have 

Tr [ACDE] = vecp(A) T B+ (E <E> C)(B+) T vecp(D). (A8) 

Using the results that (A ® B) _1 = A^ 1 ® B 1 (for non-singular matrices) and B n B+(C ® C) = (C ® C)B„B+ it 
follows that Bj(C ® C)B„B+(C- 1 ® C~ 1 )(B+) T = J and hence 

Tr [i4C _1 £>C _1 ] = vecp(A) T [B^(C ® C)B„] _1 vecp(B). (A9) 

The C; covariance matrix of Eq. (|9|) is defined by 

Mi = (vcc P (C ; - C i )vecp(C i - C,) T ) = Bj (vec(C, - C;)vec(Q - C ; ) T )B„, (A10) 

where since Ci = J2 m a lm^i m / (21 + 1) we have 

vec(Q ) = ^-j- a '™ ® a im • ( A1 1 ) 

m 

Using the general results (for appropriately sized matrices) that (A <E) B)(C <E) D) = (AC) <E> (BD) and (A ® B) T = 
A T (g) B T gives 

vec(C;)vec(C/) T = T^pTp H( a '™ a *m)( a L' ® a ^') 

mm' 

= , 2l 1 1)2 E ( a '^ a L) ® ( a ^ a L)*- (A12) 

mm' 

Hence since (aj m aj m/ ) = 5 mm /Ci we have 

(vec(C, - C,)vec(C, - C,) T ) = ® ( Al3 ) 



so that Mi = 2Bl(Ci <g) Ci)B n /(2l + 1). Then from Eq. ([Ml) we have 

Tr [AC^DCf 1 ] = ^- T vecp(A) T M- 1 vecp(B), (A14) 
establishing Eq. (fT0|) . As a special case 

vecp(C i ) T Mr 1 vecp(C ; ) = ( 2 ' + 1 )" . (A15) 
If C has eigenvectors {e^} with eigenvalues {A^} then 

(C ® B)(e, c ® e?) = (Ce t c ) ® (DeJ) - A?A$(e? ® e*), (A16) 

so the determinant is \C®D\ = A|A^ = |C|"|B|™. Also for symmetric C, so that C ® C = (C ® C)B n B+, we 
have 

Bj(C ® C)B„B+(e 4 <» e,-) = [Bj(C ® C)B„(BjB„)- 1 ]Bj(e 4 ® e,) = A i A j B^(e i ® ej ). (A17) 
So there are n(ra + l)/2 distinct eigenvectors Bj l (e i ® ej) of [Bj(C ® C)B n (B^ B„) _1 ], and hence 

^^(c^^b^b^)- 1 ! =niI A ^ = i c r +1 - ( A18 ) 

The matrix B^B n is diagonal with n unit entries and n(n + l)/2 — n = n.(n — l)/2 that are a half, so |B^B„| = 

2 -n(n-l)/2 and hence 

\Bl(C®C)B n \ = 2-"("- 1 )/ 2 |Q|" +1 . (A19) 
The covariance matrix therefore has determinant 
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Appendix B: Full-sky goodness of fit 

Often people like to quote a chi-squared value as a crude measure of how well the data fit a given model. In the 
context of the full-sky CMB, where the a; m are Gaussian, we could define 

X 2 ^^(2? + l)Tr[c i Cr 1 ] (Bl) 
i 

so that P({az m }|C) oc e~ x I" 1 . This is minimized {\ 2 = 0) when the a; m take their maximum likelihood values (zero). 
The mean is (% 2 ) = 5Z Z (2Z + l)n and variance J^i 2 (2Z + l)n. 

Alternatively, we could define an 'effective' chi-squared, measuring the goodness of fit of the {C/} to {C{\ [5]: 



Xeff = 



21n(P({C ; }|{C ; }) = ^(2; + l){Tr[c ; Cr 1 ] - log \CiC^\ - n} (B2) 



(to within a C/-independent constant). This is normalized so that if C\ = Ci then x 2 g = 0. To assess the goodness of 
fit we could compare XcS to the value expected if C/ were the true model. The expectation value under the Wishart 
distribution can be calculated by performing a Cholesky decomposition into a lower triangular matrix L, where 
C t 1 ^ 2 CiC l = LL T , and using the independence of Lij (the off-diagonal elements being Gaussian distributed, the 
diagonal elements chi-squared) [12J. The result is 



(Xeff) = £(21 + !) y 11 ^ 1 + V2) - $>(* + 1 - </2)| , (B3) 
where ip(x) = d(lnT(a;))/dx. For I 3> n we have 

(21 + 1) {nln(l + 1/2) - £ ^ + 1 - i/2)} = + y ^^T^ + (B4) 
so for a large range of I with n -C i m in < i < l max we have 

(Xcff) « (^max - U„ + l) n ^ 2 + - + ^"( 2 " 2 + 3n - 1) ln(Z max /Z min ). (B5) 

The first term is just what we would expect for a Gaussian distribution in X/, the n(n + l)/2 distinct components 

Ci. The second term is the logarithmic leading-order correction. For Z m i n = 30, Z max = 2000 it is ~ 0.7 (for n = 1), 
~ 4.6 (for n = 2) and ~ 13.7 (for n — 3). The variance can be calculated similarly, giving 

var( Xc 2 ff ) = £(2i + l) j(2i + l)^'(i + l-t/2)-2nj (B6) 

f , v 1 n(2n 2 + 3ri - 1) ^, .-."I . , 

= ^| n (n + l) + - V ^ +1 >- + 0(l/l 2 )j (B7) 

» 2 (Xcff> + J^ n ( 2n2 + 3 " - !) ln(WAmm), (B8) 

where the prime denotes the derivative. 

Note that even on the full-sky CMB lensing and other secondaries would give a non-zero connected four-point 
function that would change the variance of the C; from that calculated here for Gaussian fields. 



Appendix C: Multiple maps 

In realistic experiments there are often many maps at different frequencies, from different detectors, and/or from 
different observation periods. Often the noise on these maps can be taken to be independent to an excellent approx- 
imation. Here we consider the very simple case where each map has isotropic noise. If there are two maps and 
a lmi eacn containing sky signal plus noise, the difference map — will be independent of the signal. With n 
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maps, there are therefore n — 1 linear combinations that do no depend on the signal, and hence can be integrated 
out of the likelihood function. The remaining uncorrelated linear combination is the inverse-noise weighted combined 
map 

Jt) _ 2^i=l\ 1Sl l ) a lm 

A similar argument applies in real space with anisotropic noise. The combined map {d;™} is a sufficient statistic for 
the likelihood function, and the likelihood analysis could therefore be based on Ci estimators from the combined map 
af^l . Alternatively we could consider estimating a set of Ci from all possible combinations of maps 

rn 

In the simple case considered above, the optimal linear combination of the is oc Yliji-^i ■ ) — 1 (N^)~ X C\ , and 

using this would be equivalent to using the estimator from the combined map a^. The likelihood approximations 
approximations in the main text could be applied directly to realistic pseudo-C; generalizations of this estimator. 

An alternative is to use only the off-diagonal correlations, where i ^ j [291 ] . In the simplest case we can define the 
optimal weighted combination 

ZijiNh-HNh-Hl-Sij) 

Since (C° ) = C\ the estimator is an unbiased estimator of the Ci regardless of the noise. In some instances it might 
therefore be more robust than including the diagonal correlations, where an error in the noise model can lead to an 
immediate bias in the estimator. However this estimator is no longer equivalent to a the estimator on the weighted 

map CLj^, and has a different distribution. In particular it is not positive definite. If C° ff are to be used for parameter 
estimation, in principle it may therefore be necessary to use a different likelihood approximation from those designed 
for analysing Wishart-like distributions. 

To see how different the distribution is we consider the very simplest case of foreground-free full-sky maps where 
all the maps have identical isotropic noise — iVj, and we consider only a single scalar field (no polarization). We 
can define a n-dimensional vector of a,!^, a/ m . The estimator is then 

^ ^^^-^ ^(eet-lla,., (C4) 

where e is a vector of ones, e,[ — 1. The covariance of the a^ m is given by 

Mi = (a, m ajj = C ; eet + Nil. (C5) 
The distribution of the Cf s is then given by 



P{C?*\C h Ni) 



J da im P(a Jm |M0<5 (df - a ln af m (ee* - /) a /m ^ 

i r°° e -^cf c 

= 7T- / dk FZTT^ (C6) 

^J-oo \I - 2ika ln M l (eel - I)\ l+1/2 

where the last line follows from writing the (5-function as a Fourier transform and c^" 1 = (21 + l)n(n — 1). Substituting 
for Mi and using \I + aee T | = I + na, the characteristic function (Fourier transform of the distribution function) is 
therefore given by 

P(k\Q,N l ) = -— . (C7) 

[(I + 2ikai n N l ) n - 1 (l - 2ik(2l + l)- x {Ci + A,/n))] i+1/2 

The quantity Ci + Ni/n = Ci + N^ is just the expectation value of Cj from the optimal map. The distribution of 
C° s is therefore the same as that of the variable cf^ — Yll=\ ^Vj /( n — 1)> where is the estimator from one 
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of n — 1 independent realizations of the noise. In the limit of many maps, n 
we have 



oo keeping the total noise fixed 



hm P(Cf\C l ,N l ) = ±- 

n—too Z7T 



dk— 

oo [1 



2ik(2l + ly^Ci +iV/ t) )]'+ 1 / 



(C8) 



This evaluates to the exact full-sky likelihood for CY 1 ' , so asymptotically with many maps C° tt + iVj has the same 
distribution as c[ , and hence the likelihood can be approximated using the same approximations. 

The distribution of Cf s can be calculated analytically for the special case n — 2 (as for the marginal distribution 
of Cj E @), but usually the off-diagonal estimator would be used only when there are several maps. In general the 



moments and cumulants of the distribution of G° can be calculated from the characteristic function, since 



) d p P(k) 
dkP 



fc=0 



In particular we have 



«x = (cf) = a 

n 2 = {{Cf-Qf) 



21 + 1 



(c l + N™y + 



: _ p dP log P(k) 
dkP 



k=0 



(n - 1) 



«3 



((Cf - Ci) 3 ) 

2r- 1 ( P ~iy. I 

(2i + l)P- 1 \ 



(21 + 1)2 

r Af( t )\p 

(d + N^f + (-If y 1 ' 



(n-l)P- 1 I ' 



(C9) 

(CIO) 
(Cll) 

(C12) 

(C13) 



The terms involving (Ci + N^) are the equivalent results for C, . The distribution of C° s is therefore slightly less 
skewed than for the optimal estimator, but (as expected) with a slightly broader distribution. The third and higher 

moments will be close to those for C; if n 3> 1 + AT; /(iVj +Ci). We therefore anticipate that if there are enough 
maps that this criterion is satisfied, n>2, the likelihood approximations presented in this paper should also work 
well using the estimator C° s + iVj . 

Note that even though C° s is unbiased regardless of the noise, the posterior mean of C; will depend on the noise, 
and there could therefore be a posterior bias on parameters even if there is no bias directly on the estimators. This 
bias due to noise error is however suppressed by a factor of ~ l/l compared the direct bias that would arise from 
using with an incorrect noise model. 



Appendix D: Cut-sky estimators, covariance and exact likelihood 



1. Calculating the CMB cut-sky estimators 



For limited sky coverage the temperature field is observed over only part of the sky. For full-sky observations part 
of the sky is likely to be dominated by galactic foregrounds, and CMB observations are effectively only available over 
the region of the sky outside a galactic (and point source) cut. In addition noise properties are generally not uniform 
across the sky; indeed a cut-sky can be thought of a full-sky observation with infinite noise in the cut region. For 
these reasons it is useful to define a weighted temperature field T given by 

f (si) = w T (n)T(n), (di) 

where W T is a weighting function defined over the whole sky that lies in the range to 1. The simplest weighting 
function is zero in the cut region and one in the region with useful data; however more general window functions can 
be useful to obtain more optimal estimators. The pseudo-harmonics af m are then defined by the spherical harmonic 
transform of T(Sl). They are related to the underlying un- weighted full-sky coefficients by 

I'm' 
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where the harmonic window function is defined as 



dnw r (n)y z . m /(n)y^(n) 



This can also be expressed as [22 



£ 



W. 



I" m" 



(2/ + l)(2/' + l)(2r' + l)\ 1/2 



47T 



(-1)' 



I V I" 




I V I" 



(D3) 



with the spherical harmonic transform coefficient of the window function given by 



WW, 



w T (n)Y^(n)dn. 



Similarly, for the polarization field the cut-sky pseudo-harmonic modes can be expanded as (see for example |17| ) 

(D4) 



I'm' 

af m = Y,(+Ww m ' a vm> - i-Wff m 'a$ m ,). 



I'm' 



Here 



with the spin weighted harmonic window function for spin s = ±2 given by 



(D5) 

(D6) 
(D7) 

(D8) 



where s Yi m (Q) are the spin- weighted harmonic functions. For azimuthal cuts the coupling matrices are diagonal in 
m, so W[{} m — 5 mm >W[{!, and they can be calculated quickly using a set of recursion relations (l7j . 
The pseudo-C/ power spectra are defined by 



Cl =21 



QBE 



— £ ®Tm(®lm)* Cj E — ^ - £ aL(S 
m m 

1 \ " ~E i~E \* fiBB _ 1 \ "* ~B i 

/ , a lm\ a lm) °i = 21 \ 1 



"im) 



(D9) 



2/ + l^ ,mv ,m/ 1 ~ 2/ 

m 

Their expectation values are related to the full-sky power spectra via the relation 
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(D10) 



V Mff Mff, J V Cr J 



where the coupling matrices are [34 



Ml? 



TE 



21 



TT\ 



Mil 



M EE 



M EB 



L_J2\W% m '\ 2 = (2l' + l)Z TT (l,l', 

mm' 

T^T) E \wlr i '\ + wiv Lm ' ] )\ = (2/' + l)H TE (M' ) 



W 



(2/ 



~Yj E l( + ^ ( r°)l 2 = (2l' + l)E EE (l,l',W pp ) 

mm' 

£ |(.^ W) )| 2 = (21' + l)E EB (hl',W PP )- 



(21 + 1) 



(Dll) 
(D12) 
(D13) 
(D14) 
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The window function enters via its power spectrum W^ Y given by 



21 



—y 



, , x , , y * 



(D15) 



and X and Y being either T or P. For isotropic noise tests we only consider ui^ = (jjj m . The symmetric S-matrices 
are defined by 



4tt 







(2Z 3 



8 a 



-^(i + C-i^fo 1 o 2 o 3 



h h h 
-2 2 



E EE (h,l 2 ,W) = Y, 



(2?3 + 1 W + (-m 



16tt 



i. , 2 / h h h 
-2 2 



E EB (h,l 2 ,W) 



^-^ l07T 



(-1) L ) 2 



-2 2 



(D16) 



for L = l\ + I2 + h- All other coupling matrices are zero. 

Provided that the sky cut is small (the usable region is larger than half the sky), the coup ling matrix in Eq (|D10[) 
is invertible and pseudo-C; estimators for the power spectrum are given by (see for example jl8l. [28j) 
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C EE 
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(D17) 



The estimators are unbiased, (Ci) — C\. When the observed area is small the matrix is not invertible. In this case the 
Ci can be binned into bands to construct band-power estimates of the power spectrum [22| in an analogous manner. 
Here we shall focus on nearly full-sky observations such as expected from the Planck satellite where estimates can be 
obtained for each C\ individually. 

Unlike in the full-sky case, the exact cut-sky likelihood function cannot be written purely in terms of a set of 
pseudo-C; estimators, so the compression of the observed data to the estimators is not lossless. However it can be a 
good approximation, and the estimators are convenient because the correlations between the C\ induced by the sky 
cut are accounted for easily. 



Covariance matrix 



The covariance matrix of the Cf T is given by 



(2J+1)(2I' + 1) 



mm' l\m\ l^m^, 



(Dig) 



As suggested by Ref . [l8[ , this expression of the Cj T covariance matrix may be simplified for the case of a narrow 
Galactic cut. In this case, Cf ± T and C^ T can be replaced with Cj T and Cf T , respectively and then by applying the 
completeness relation for spherical harmonics (35l j , the temperature Ci 's covariance matrix would be given by 



(AC; AC;/ ) = 2Cf T Cf r 3 TT (l,l',W IT ) 



(D19) 
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The covariance matrix of the C; -estimators is then given by 



(D20) 



Unfortunately, the other covariances do not simplify as easily since the completeness relation works only for the 
spherical harmonics with similar spin. For our azimuthal tests we use W T (£l) that takes values 1 or and approximate 
the pseudo-covariances by the following 



(AC ; TT AC,T T 

(ACf E ACf E 
(AC EE AC EE 



(AC BB AC BB 



(ACf T AC EE 



(ACrAC™ 



(ACf E ACZ E 



(AC EE AC BB 



r<TT r<TT 
2 1,. l [. M TT 



(21' + 1) 



CjCj,C E C E 



(2/' + l) 

qEEqEE 



TE 



qTEqTE 



(2Z' + 1) 

qBBqBB 



Mfir + 2 



(2V + 1) 

(jBBfjBB 



Mi 



TT 



v M BB 



(2i' + l) 
2 (21' + 1) Mlv ' 



(21' + 1) 

(jEEfjEE 

(2V + I) 



AC 



EB 
W ) 



" A^, 



CJ T CJ, T (CJ E + CJ E )MJ V T 
(2/' + 1) 

ylc? E C* E {C? E + C% E )Mli 
(21' + 1) 



7'E 



(JEEQEE _|_ (jBBfjBB 

2(2/' + 1) 



-M EB . 



(D21) 

(D22) 
(D23) 
(D24) 
(D25) 

(D26) 

(D27) 

(D28) 



Note that in the presence of isotropic noise the C; here include the noise contribution. 

At high I one can approximate Mj E — -M EE = Aif B = -Mf^, since the spin ±2 harmonics become close to the 
spin zero ones. Note our approximations in Eqs. (|D26j) . (|D27|) differ from those in Ref. [24j: since the Cf E can be 
negative we require consistency with the exact result on the full-sky rather than forcing these terms to be positive. 
Also, note the difference in Eqs. (|D23|) . (|D24[) from those in Ref. [24]]. More general results applicable with anisotropic 
noise and general weight function are given in Appendix [El More accurate results accounting for the complications of 
E-B mixing are given in Ref. [3(|; see also Ref. (2(|. Note that inaccuracies in the covariance matrix generally only 
affect the error bars; to this extent accuracy is less crucial than getting the estimators or likelihood function accurate, 
since there an inaccuracy could introduce biases. 

The covariance of the Cj-estimators can be calculated from the C\ covariance using the relevant coupling matrices. 



3. Exact likelihood for temperature and polarization 

Although an exact likelihood calculation is prohibitively slow in general, for azimuthal sky cuts the relevant matrices 
are block-diagonal in m and the calculation is numerically tractable. For the special case of azimuthal cuts we can 
therefore test cut-sky likelihood approximations against the exact result. 

For each m we can define a vector of pseudo-harmonic coefficients 
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which can simply be written as 
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(D29) 



X = diag[^, ( , m) , 



iW, 



(rn) 



(D30) 
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For Gaussian fields X is just a linear combination of Gaussian harmonics, and hence also Gaussian. However due to 
the sky cut the coupling matrix is not directly invertible, as the VF-matrices will have eigenvalues very close to zero 
(corresponding to modes localized in the un-observed region). However we can use a singular value decomposition 
(SVD) to isolate the observable independent modes following Ref. [HI, We diagonalize the transformation matrix 

as diag^W^T^, 2W^\ -^W^J 1 ^ = UDU^ and define new linear combinations: 

x , = xj-Vatffx = .D^C/tX. (D31) 

Here D denotes the smaller square matrix obtained from D by deleting nearly-zero rows and columns. U is the 
corresponding rectangular matrix obtained from U by deleting the corresponding columns. 
The signal correlation is: 

S = (X'X't) = £> 1 / 2 [/ t (XX t )[/ J D 1 / 2 

(CJ T CJ E Cj E \ (D32) 

= D 1 ' 2 U^ \ C? E C EE + C BB C EE -C BB \UD 1 ' 2 . V ' 

\ (jTE (jEE _ (jBB (jEE _|_ (jBB J 

If the noise is isotropic and uncorrelated, this frame structure provides a diagonal noise correlation [l7j : 



(XjvX^) = <diag(W^ m) , 2+W^\ 2^W l \ , r } ) => N = (X' W X^) = a 2 ,diag(l, 2, 2), (D33) 

where we have considered ajj 2 = a 2 N and a E2 = a B2 = 2a 2 N for simulation purposes. 
Given that the signal and noise are Gaussian, the likelihood function is then given by 

rp P TP , s exphlX'tfS + JV)- 1 ^] 
£({C?, Cf , Q , Q S }|X') ex |<? + iV|V2 (° 34 ) 

The only approximation is in the choice of cutoff value for the SVD; for non-zero noise the result is insensitive to this 
choice as long as it is small. 



Appendix E: Anisotropic noise: estimators and test simulation 
1. Hybrid Pseudo-C; estimators with cross- weights 

We consider pixelized maps with anisotropic but uncorrelated pixel noise variance a 2 (in this section the Ci do not 
include noise). We generalize the hybrid Pseudo-C; method of Ref. [25[ slightly to include Pseudo-Q estimators from 
mixed weights, e.g. using a set of Pseudo-C/s 

1 = 21 + 1 ^ lm lm ' ^ ' 

rn 

where is defined using weight function w l . For each X and Y there are therefore n(n + l)/2 distinct estimators 
if X = Y, or n 2 if X ^ Y, where n is the number of weight functions. For high signal to noise the best weight 
function should be close to uniform to minimize cosmic variance, for low signal to noise it should be proportional 
to the inverse-noise to minimize the noise [l8j]. Combining results from two weight functions, one with uniform and 
one with inverse-noise weighting, is therefore perhaps the most natural choice, especially if the polarization noise is 
proportional to the temperature noise in each pixel as we assume for our test simulations. Including the cross-estimator 
between maps with different weight functions is particularly useful for estimating Cf E : since the polarization noise is 
much larger than the temperature, over a wide range of scales the cross-estimator between uniform and inverse-noise 
weighted maps is much better than using uniform/uniform or inverse- noise/inverse-noise. Even for the temperature 
case there is a range of scales in between noise and signal domination where the cross-estimator can be useful. Including 
more than two weighting functions seems to gain very little, so we use just two. 
The unbiased C; estimators are constructed using the coupling matrix 

Cf Y ^ = [M XY ' l %}C* Y < l \ (E2) 
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where 

Mf/> ij = {21' + 1)E XY {1, V, WV), W*> = ^ «*- (E3) 

m 

and the coupling matrices are defined in Eqs. (|D16|) . 

The noise contribution to the Pseudo-C; is given, for uncorrelated pixel noise (cj) 2 , {erf) 2 , (c^ 7 ) 2 and pixel area 
tt s , by 

^ T,ij = ^£(OM*x»fi2 ( E4 ) 

s 

= ffBB M = l_^ [{a ff + {a u f] w i (s)w i {s)Sl l (E5) 

with other combinations being zero. We then have {C^' 1 ^) = C* Y + J^,, [M XY, ^\7,} Nu ' lJ ■ 

From multiple Pseudo-C estimators with different weight functions one can either attempt to apply the likelihood 
approximations directly to the complete set of estimators, or one can compress into a single hybrid estimator. At low I 
it is likely to be beneficial to also include more optimal estimators than Pseudo-C , especially for the polarization [25| . 

A hybrid pseudo-C estimator can be constructed following Ref. [25j : this is defined by constructing the best-fit 
Cj to the multiple estimators by minimizing the Gaussian-approximation to the likelihood using the approximate full 
covariance. We do this separately for each temperature-polarization spectrum, so that the hybrid estimator is just a 
linear combination of the individual estimators rather than mixing estimators of different type. Since the polarization 
noise is higher than for the temperature, we consider cross-spectra of the form C^ E ' 1 -' where i > j, and the weight 
functions are ordered so that lower i are more optimal in the case of lower noise. We then have the same number of 
cross-weight spectra for each of the power spectra. Since the hybrid estimators are just linear combinations of the 
separate estimators, their covariance can easily be calculated from the coupling matrices and full covariance matrix 
approximations given below. When including C BB we impose a uniform weight function at I < 120 to minimize E/B 
mixing effects and ensure that the covariance matrix approximations below remain accurate. This is suboptimal but 
unbiased; we do not investigate the more difficult problem of optimally constraining the tensor amplitude here. 

2. Covariance matrix approximations 

Approximations for some components of the covariance matrices for the Pseudo-Cs were given in Ref. [25| for a 
general pixel-weighting function w(s) (pixels area f2 s ) and anisotropic but uncorrelated instrumental pixel noise (erj 1 ) 2 
and (c!?) 2 = (c^ 7 ) 2 . The approximations essentially make as many assumptions as necessary for the result to simplify 
to the forms given; the approximations should be reasonably accurate for small cuts at high I (where s Yi m ~ Yim) 
and noise-dominated B-polarization spectra. Here we summarize these results with slight generalization, and extend 
to include all the terms needed for the full polarized and correlated estimator covariance. We only consider the 
case of using Pseudo-C; estimators from single maps of T, Q and U with various weighting; the noise properties of 
cross-spectra between multiple maps with independent noise are a simple generalization. 

Assuming the polarization and temperature noise is uncorrelated, the covariance of the Pseudo-C; estimators can 
be estimated using the approximations (for I 1 and significant noise so that E-B mixing effects are small and large 

/sky): 

(AC^ACj 7 ^) « Cj T Cj, T [S TT (M', W^ti) +E TT (l,l\W( i «W p ')) 
+ {CjCj.fl 2 [~ TT {1, l', W^fcOCft)) + 5tt (Z, /', w 2T ^^) + E TT {1, 1', W 2T ^^) + E TT {1, 1', W 2T ^^) 

+ E T t{IJ',W tt(ip ^) +Hrr(M',^ TT(i,)0 ' p) ), (E6) 

+ E TE (l,l',W TQ ^W) + (CrC$ T ) 1/2 E TE (l,l',W 2Q W^) + {C EE C^ E ) l ' 2 E TE {l,l',W 2T ^^), (E7) 
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(ACf E ' ij AC EE ' pg ) w C EE C EE \e ee (1, I', + J', w^(w)Cjp)) 

+ (cf B c ; f B ) 1 / 2 \s EE {i, v, w^o(*p)Cj-9)) + ~ BB (z, r, w^ww) + /', w^px*)) + w^ow*) 

+ E EE (l, l', w QQ{ip){ J q) ) + E EE (l, I', W QQ ^ jp) ), (E8) 



( A C BB ' ij AC BB ' pq ) « C? B C$ B [h B b(J,J', W (ip)W,) ) +H B B(i,J', W (i9)(jp) ) 

+ {Cf B C BB fl 2 \Z EE (1, V, W*«(«OCj«)) + W" 2 O(*9)0p)) + Z', W^OCmOM) + ~ BB (Z, Z', W^OCiaXip) 

+ S BB (Z, r, W QQ{ip){ J q) ) + E EE (l, I', w QQ{iq){3p) ), (E9) 



(ACf E ' ij AC BB ' pg ) « [(Cf £ Cf i? ) 1 / 2 + (Cf B C* B ) 1 / 2 ] 2 I [ HBB (i, /', ^«p)Cifl)) + E EB (l, I', W™™) 

+{c EE c EE ) 1 ' 2 [5 Efl (i, f, ^ 2 «*)W) + h bs (;, ^^(^W) + s EB (i, i', w 2 «WM) + Seb(I) f , ^Qa,)*)) 

+ E EB (l, I', w QQ{ip){ ' jq) ) + E EB (l, I', wQQMUp)), (E10) 



(ACf T ' ij ACf E ' pq ) « ^(CTC^) 1 /^^ + Cj E ) [~ TT (l, V, W^^) + E TT (l, I', W™^) 

+ l -( C f E + Cj E ) [~ TT (l, l\ W^MM) + Ztt(1, l\ W^WM)] , (Ell) 



(AC? E ^ACf E ' pg ) « ^(C EE C EE )V 2 (C? E + Cj E ) [e ee (1, I', wMM) + E EE (l, V, W^Cw)) 

+ \{CT E + C? E ) [s ee (IJ\W 2Q ^^) + S ee (IJ\W 2Q ^^)] , (E12) 



(ACf T ' ij AC EE ' pg ) « Cj E Cj, E \Ztt{1, I', W^^) + E TT (l 7 1', W^^) 
where the various window functions appearing are determined by the power spectra 



1 21 + 1^ 



11 PQ * 
W lm W Z 



W, 



TT(ij)(pq) 



21 + 



_|_ 1 / \ W lm W lm )i VV l — VV l '21 + 



yy2T(ij)(pq) _ 1 W„ 



( W l 3 m W lm q *)> Wl 



~ Ar 2Q(ij)( P q) _ yy2U(ij)(pq) _ 1 \^C,„W)„„Q,PQ 



2/ 



(E13) 

(E14) 
(E15) 
(E16) 



1 \- < 

= 2i + lL«'« 



Q,ij w Q,pq* 



and the harmonic coefficients are given as sums over pixels with area fi s as 

5 S 



(E17) 

(E18) 
(E19) 
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At the level of approximation considered here ~E.ee ~ Ett ~ Ete, so there is some ambiguity in which particular 
form to use in the approximations. Note that the contribution of E to the Cf covariance is neglected, which is 
a poor approximation when the noise is not dominant; more accurate approximations are given in Ref. [30]. If the 
£?-polarization contribution to the variance becomes important relative to the noise, the non-Gaussianity of the lensed 
£?-polarization field also becomes an issue (see e.g. Ref. [6]). For Planck noise levels the i?-lensing signal is well below 
the noise and E/B mixing effects are also well below the noise at I > 100. 

The covariance matrix for the C; estimators is determined straightforwardly by applying the inverse coupling matrix 
to the above results. The covariance of the hybrid estimator is then just a contraction of the full multi-estimator 
covariance with the hybrid mixing matrix. 



3. Test simulations 



The diagonal of the covariance matrix approximations given above agree very well with simulations at I > 30 
if the weight map does not have too much small scale power. The covariance approximations are more sensitive 
to small scale power in the noise and weights than the coupling matrices; for this reason we use a smoother mask 
and noise map than is needed to obtain an accurate coupling matrix. This avoid numerical issues in our tests so 



that we can focus on any errors due to the likelihood approximations. We use a HEALPix 10 33] pixelization at 
iV s ide = 2048, upgrading the simulated Planck noise map [3l| and convolving it with 7arcmin Gaussian kernel so that 
it is smooth on this scale. For the mask we take the WMAP kp2 map, upgrade to iV S ide = 2048 (12 x 2048 2 pixels), 
smooth with 7arcmin kernel, set negative pixels to zero, and smooth again with a 7arcmin kernel. This gives point 
source cuts that still go to essentially zero, while having edges smoothly tapering to one. To calculate the Pseudo-C; 
estimators we take w 1 as uniform weighting (multiplied by the cut), and a regularized inverse-noise weighting given by 
w 2 (s) oc 1/ (er 2 + min(cr 2 )), smoothed with a 7arcmin kernel and then multiplied by the cut. We use the same weight 
functions for temperature and polarization, and take (if) 2 = (07 ) 2 = 4(er;T) 2 for simplicity. Gaussian simulations 
are done to l max = 2200 with zero monopole and dipole. The simulation code is available on the web 11 . 



4. Lensed simulation 

The largest non- linear effect on intermediate scales is expected to be that of CMB lensing (3(| • Detailed modelling 
of the non-Gaussian distribution induced by this effect is beyond the scope of this paper, however for Planck noise 
levels the non-Gaussianity can be neglected to good approximation when performing parameter analyses from the 
lensed CMB power spectra [3jJ. The effect of lensing on the power spectrum is many percent, and must be included 
to obtain correct parameters with Planck. We update the LensPix code [H| to quickly simulate high-resolution lensed 
maps accurately. Our simulation method is as follows: 1. we simulate a HEALPix map of a realization of the lensing 
deflection angle from a Gaussian realization of the lensing potential; 2. Divide the sphere into a number of slices 
separated by lines at constant polar angle 0, and assign each slice to a different processor (with some overlap given 
by the largest ^-deflection); 3. each processor simulates a Gaussian unlensed CMB map over its assigned slice on an 
equicylindrical grid; 4. interpolate from the equicylindrical grid to the deflected positions corresponding to the centre 
of HEALPix pixels offset by the deflection angles. Equations used for simulati ng g radient maps, deflecting points 
along geodesies, and appropriately rotating Stokes parameters are given in Ref. 38]. Our updated code is publicly 
available 12 . 

For our simulation we use .A s ide = 2048, and generate equicylindrical unlensed grids with points at 6144 different 9 
values (interp_factor = 1.5, effectively the same resolution as HEALPix at Aside = 2048). The number of 0-pixels is 
chosen for each slice to be of the form 2 ra 3 m (for integer n, m) so that FFTs can be performed quickly, with lowest 
spacing roughly the same as the spacing in 9. To interpolate we use an extended cubic interpolation algorithm 
TOMS760 [30]; this is significantly slower than a basic bicubic interpolation scheme, but more accurate and stable 
- it ensures our results converge as the number of equicylindrical pixels is increased. Averaged over simulations 
our simulated lensed CMB power spectra then agree at the 0.1%- level with theoretical expectations for the same 
! max [3{||4(J. Other simulation method are discussed in Refs. (41H44| . though non-linear evolution effects are minor at 
Planck noise levels. Since the unlensed CMB is not band limited but contains residual power at I > 2000 our method 



http : //www. eso . org/science/healpix/ 

http : //cosmologist . inf o/cosmomc/CMBLike . html 

http : //cosmologist . inf o/lenspix 
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FIG. 11: Simulated parameter constraints from eight lensed CMB realizations using the new likelihood approximation with E 
and T (and cross) hybrid pseudo-Ci estimators at 30 < I < 2000. The covariance was calculated using only the lensed power 
spectra. Input parameter values are marked with vertical lines, and the reionization optical depth was fixed. Including B 
estimators has virtually no effect at Planck noise levels. 

does not rely on band-limited interpolations and works directly with maps that contain power up to the highest 
simulated Z max . On a modern few-node cluster lensed maps with polarization can be simulated in a few minutes. 

Figure [11] shows parameter estimation constraints generated using a set of simulated lensed maps with Planck- like 
noise, and modelling the covariance as in the unlensed case simply by using the lensed power spectra instead of the 
unlensed ones. A more optimal analysis would use the non-Gaussian information in the lensed field to indirectly 
constrain the lensing potential and hence cosmological parameters (see e.g. references in [36jh though it is unclear 
how much can be gained in the presence of real-world complications. 
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